Abstract

In this paper, we use the known camera motion associated with a video sequence of a static scene in order to estimate and incrementally refine the surrounding depth field. We exploit the SO(3)-invariance of the dynamics of the brightness and depth fields to customize standard image processing techniques. Inspired by the Horn-Schunck method, we propose an SO(3)-invariant cost to estimate the depth field. At each time step, this provides a diffusion equation on the unit Riemannian sphere that is numerically solved to obtain a real-time depth field estimate over the entire field of view. Two asymptotic observers are derived from the governing equations of the dynamics, based respectively on optical flow and on depth estimates: implemented on noisy sequences of synthetic images as well as on real data, they provide a more robust and accurate depth estimation. This approach is complementary to most methods employing state observers for range estimation, which concern only single or isolated feature points.

1 Introduction 

Many vision applications aim at assisting interaction with the environment. In military as well as in civilian applications, moving in an environment requires topographical knowledge, either to avoid obstacles or to engage targets. Since this information is often inaccessible in advance, the real-time computation of a 3D map is a goal that has kept the research community busy for many years. For example, environment reconstruction is tightly related to the SLAM problem [1], which is addressed by nonlinear filtering of observed key feature locations (e.g. [2, 3]) or by bundle adjustment [4, 5]. However, estimating a sparse point cloud is often insufficient, and the transition from a discrete local distribution of 3D locations to a continuous depth estimate of the surroundings is still an ongoing research topic. Dynamical systems provide interesting means for incrementally estimating depth information from the output of vision sensors, since only the current estimates are required and image batch processing is avoided.

In our work, we are interested in recovering in real time the depth field around the carrier, under the assumptions of known camera motion and a known projection model for the onboard monocular camera. The problem of designing an observer to estimate the depth of single or isolated keypoints has raised a lot of interest, specifically in the case where the relative motion of the carrier is described by constant known [6, 7], constant unknown [8] or time-varying known [9, 10, 11, 12, 13] affine dynamics. From a different perspective, the seminal paper [14] performs incremental depth field refinement for the whole field of view via iconic (pixel-wise) Kalman filtering. Video systems, typically found on autonomous vehicles, have successfully used this approach to refine the disparity values obtained by stereo cameras in order to estimate the free space ahead [15]. Average optical flow estimates over planar surfaces have also been used for terrain following, in order to stabilize the carrier at a certain pseudo-distance [16]. Yet none of these methods provides an accurate dense depth estimate in a general setting concerning the environment and the camera dynamics.

We propose a novel family of methods relying on a system of partial differential equations describing the SO(3)-invariant dynamics of the brightness perceived by the camera and of the depth field of the environment. Based on this invariant kinematic model and on the knowledge of the camera motion, these methods provide dense estimates of the depth field at each time step and exploit this SO(3)-invariance.

The present paper is structured as follows. The invariant equations governing the dynamics of the brightness and depth fields are recalled in section 2, and their formulation in pinhole coordinates is given. In section 3, we adapt the Horn-Schunck algorithm into a variational method providing depth estimates. In section 4, we propose two asymptotic observers for depth field estimation: the first one is based on standard optical flow measurements and the second one enables the refinement of rough or inaccurate depth estimates; we prove their convergence under geometric assumptions concerning the camera dynamics and the environment. In section 5, we test these methods on synthetic data and compare their accuracy, their robustness to noise and their convergence rate; tested on real data, this approach gives promising results.



2 The SO(3)-invariant model



2.1 The partial differential system on $\mathbb{S}^2$

The model is based on geometric assumptions introduced in [17]. We consider a spherical camera whose motion is known. The linear and angular velocities $v(t)$ and $\omega(t)$ are expressed in the camera frame. The position of the optical center in the reference frame $\mathcal{R}$ is denoted by $C(t)$. The orientation with respect to $\mathcal{R}$ is given by the quaternion $q(t)$: any vector $\varsigma$ in the camera frame corresponds to the vector $q\varsigma q^*$ in the reference frame $\mathcal{R}$, using the identification of vectors with imaginary quaternions. We thus have $\frac{d}{dt}q = \frac{1}{2}q\,\omega$. A pixel is labeled by the unit vector $\eta$ in the camera frame: $\eta$ belongs to the sphere $\mathbb{S}^2$ and receives the brightness $y(t,\eta)$. Thus, at each time $t$, the image produced by the camera is described by the scalar field $\mathbb{S}^2 \ni \eta \mapsto y(t,\eta) \in \mathbb{R}$.

The scene is modeled as a closed, $C^1$ and convex surface $\Sigma$ of $\mathbb{R}^3$, diffeomorphic to $\mathbb{S}^2$. The camera is inside the domain $\Omega \subset \mathbb{R}^3$ delimited by $\Sigma = \partial\Omega$. To a point $M \in \Sigma$ corresponds one and only one camera pixel: if the points of $\Sigma$ are labeled by $s \in \mathbb{S}^2$, then for each time $t$ a continuous and invertible transformation $\mathbb{S}^2 \ni s \mapsto \phi(t,s) \in \mathbb{S}^2$ makes it possible to express $\eta$ as a function of $s$: $\eta = \phi(t,s)$.

The density of light emitted by a point $M(s) \in \Sigma$ does not depend on the direction of emission ($\Sigma$ is a Lambertian surface) and is independent of $t$ (the scene is static). This means that $y(t,\eta)$ depends only on $s$: thus $y$ can be seen either as a function of $(t,\eta)$ or, via the transformation $\phi$, as a function of $s$. The distance $\|\overrightarrow{C(t)M(s)}\|$ between the optical center and the object seen in the direction $\eta = \phi(t,s)$ is denoted by $D(t,\eta)$, and its inverse by $\Gamma = 1/D$. Fig. 1 illustrates the model and the notations. We assume that $s \mapsto y(s)$ is a $C^1$ function. For each $t$, $s \mapsto D(t,s)$ is $C^1$ since $\Sigma$ is a $C^1$ surface of $\mathbb{R}^3$.




Figure 1: Model and notations of a spherical camera in a static environment. 



Under these assumptions, we first have 
where $h$ is any scalar field defined on $\mathbb{S}^2$ and $\nabla h$ its gradient with respect to the Riemannian metric on $\mathbb{S}^2$. The value of $\nabla h$ at $\eta \in \mathbb{S}^2$ is identified with a vector of $\mathbb{R}^3$ tangent to the sphere at the point $\eta$, itself identified with a unit vector of $\mathbb{R}^3$ in the moving camera frame. The Euclidean scalar product of two vectors $a$ and $b$ in $\mathbb{R}^3$ is denoted by $a \cdot b$ and their wedge product by $a \times b$. By differentiation, the identity $q\eta q^* = \overrightarrow{C(t)M(s)}/D(t,\eta)$, where $\eta$ is identified with an imaginary quaternion, yields



where * denotes conjugation and 
since the vector $\eta \times \omega$ corresponds to the imaginary quaternion $(\omega\eta - \eta\omega)/2$. Therefore, the intensity $y(t,\eta)$ and the inverse depth $\Gamma(t,\eta)$ satisfy the following equations:

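$$\frac{\partial y}{\partial t} = -\nabla y \cdot \big(\eta \times (\omega + \Gamma\,\eta \times v)\big) \tag{3}$$

$$\frac{\partial \Gamma}{\partial t} = -\nabla \Gamma \cdot \big(\eta \times (\omega + \Gamma\,\eta \times v)\big) + \Gamma^2\, v \cdot \eta \tag{4}$$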



Equations (3) and (4) are SO(3)-invariant: they remain unchanged by any rotation described by the quaternion $\sigma$ and changing $(\eta, \omega, v)$ into $(\sigma\eta\sigma^*, \sigma\omega\sigma^*, \sigma v\sigma^*)$. Equation (3) is the well-known optical flow equation, which can be found in different forms in numerous papers (see [18] or [13] for example), while (4) is less standard (see e.g. [17]).
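As an illustrative sanity check (not part of the original method), the following Python sketch verifies numerically that the apparent motion field $\eta \times (\omega + \Gamma\,\eta \times v)$ appearing in (3) and (4) is tangent to the sphere and is rotated consistently when $(\eta, \omega, v)$ are rotated by a unit quaternion $\sigma$. Quaternions are stored as `(w, x, y, z)` tuples and combined with the Hamilton product; the function names and numerical values are arbitrary choices made for this illustration.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw*qw - px*qx - py*qy - pz*qz,
            pw*qx + px*qw + py*qz - pz*qy,
            pw*qy - px*qz + py*qw + pz*qx,
            pw*qz + px*qy - py*qx + pz*qw)

def rotate(sigma, a):
    """a -> sigma a sigma^*, with a identified with an imaginary quaternion."""
    conj = (sigma[0], -sigma[1], -sigma[2], -sigma[3])
    return qmul(qmul(sigma, (0.0,) + tuple(a)), conj)[1:]

def apparent_motion(eta, omega, v, gamma):
    """eta x (omega + Gamma eta x v), the field transporting y and Gamma in (3)-(4)."""
    inner = tuple(o + gamma * c for o, c in zip(omega, cross(eta, v)))
    return cross(eta, inner)

theta, axis = 0.7, normalize((1.0, -2.0, 0.5))
sigma = (math.cos(theta / 2),) + tuple(math.sin(theta / 2) * a for a in axis)
eta, omega, v, gamma = normalize((0.2, -0.1, 1.0)), (0.1, 0.3, -0.2), (1.0, 0.5, 0.0), 0.4

w = apparent_motion(eta, omega, v, gamma)
w_rot = apparent_motion(rotate(sigma, eta), rotate(sigma, omega), rotate(sigma, v), gamma)
print(abs(dot(w, eta)))                                          # tangency: round-off level
print(max(abs(a - b) for a, b in zip(w_rot, rotate(sigma, w))))  # equivariance: round-off level
```

Both printed values should be at floating-point round-off level, reflecting that cross products commute with proper rotations.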



2.2 The system in pinhole coordinates 

To use this model with camera data, one needs to write the invariant equations (3) and (4) in local coordinates on $\mathbb{S}^2$ corresponding to a rectangular grid of pixels. One popular solution is to use the pinhole camera model, where the pixel of coordinates $(z_1, z_2)$ corresponds to the unit vector $\eta \in \mathbb{S}^2$ with coordinates in $\mathbb{R}^3$: $(1 + z_1^2 + z_2^2)^{-1/2}\,(z_1, z_2, 1)^T$. The optical camera axis (pixel $(z_1, z_2) = (0, 0)$) corresponds here to direction 3. Directions 1 and 2 correspond respectively to the horizontal axis from left to right and to the vertical axis from top to bottom on the image frame.
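A minimal Python sketch of this pinhole parametrization and of its inverse, valid only on the open hemisphere in front of the camera; the function names are illustrative.

```python
import math

def pinhole_to_sphere(z1, z2):
    """Unit vector eta of S^2 for the pixel with pinhole coordinates (z1, z2)."""
    rho = math.sqrt(1.0 + z1 * z1 + z2 * z2)
    return (z1 / rho, z2 / rho, 1.0 / rho)

def sphere_to_pinhole(eta):
    """Inverse map, defined only for directions in front of the camera (eta_3 > 0)."""
    if eta[2] <= 0.0:
        raise ValueError("direction is not in front of the camera")
    return (eta[0] / eta[2], eta[1] / eta[2])

print(pinhole_to_sphere(0.0, 0.0))                      # optical axis: (0.0, 0.0, 1.0)
print(sphere_to_pinhole(pinhole_to_sphere(0.3, -0.2)))  # approximately (0.3, -0.2)
```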

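The gradients $\nabla y$ and $\nabla\Gamma$ must be expressed with respect to $z_1$ and $z_2$. Let us detail this derivation for $y$. Firstly, $\nabla y$ is tangent to $\mathbb{S}^2$, thus $\nabla y \cdot \eta = 0$. Secondly, the differential $dy$ corresponds both to $\nabla y \cdot d\eta$ and to $\frac{\partial y}{\partial z_1} dz_1 + \frac{\partial y}{\partial z_2} dz_2$. By identification, we get the Cartesian coordinates of $\nabla y$ in $\mathbb{R}^3$. Similarly, we get the three coordinates of $\nabla\Gamma$. Injecting these expressions in (3) and (4), we get the following partial differential equations (PDE) corresponding to (3) and (4) in local pinhole coordinates:

$$\frac{\partial y}{\partial t} = -\frac{\partial y}{\partial z_1}\Big(z_1 z_2\,\omega_1 - (1+z_1^2)\,\omega_2 + z_2\,\omega_3 + \Gamma\sqrt{1+z_1^2+z_2^2}\,(-v_1 + z_1 v_3)\Big) - \frac{\partial y}{\partial z_2}\Big((1+z_2^2)\,\omega_1 - z_1 z_2\,\omega_2 - z_1\,\omega_3 + \Gamma\sqrt{1+z_1^2+z_2^2}\,(-v_2 + z_2 v_3)\Big) \tag{5}$$

$$\frac{\partial \Gamma}{\partial t} = -\frac{\partial \Gamma}{\partial z_1}\Big(z_1 z_2\,\omega_1 - (1+z_1^2)\,\omega_2 + z_2\,\omega_3 + \Gamma\sqrt{1+z_1^2+z_2^2}\,(-v_1 + z_1 v_3)\Big) - \frac{\partial \Gamma}{\partial z_2}\Big((1+z_2^2)\,\omega_1 - z_1 z_2\,\omega_2 - z_1\,\omega_3 + \Gamma\sqrt{1+z_1^2+z_2^2}\,(-v_2 + z_2 v_3)\Big) + \Gamma^2\,\frac{z_1 v_1 + z_2 v_2 + v_3}{\sqrt{1+z_1^2+z_2^2}} \tag{6}$$

where $v_1, v_2, v_3, \omega_1, \omega_2, \omega_3$ are the components of the linear and angular velocities in the camera frame.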
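The identification step described above has a closed form. Writing $\rho = \sqrt{1+z_1^2+z_2^2}$, the conditions $\nabla y \cdot \eta = 0$ and $\nabla y \cdot \partial\eta/\partial z_i = \partial y/\partial z_i$ give $\nabla y = \rho\,\big(\tfrac{\partial y}{\partial z_1},\ \tfrac{\partial y}{\partial z_2},\ -z_1\tfrac{\partial y}{\partial z_1} - z_2\tfrac{\partial y}{\partial z_2}\big)$. The Python sketch below (our own illustration, not code from the paper) implements this lift and checks it on the hypothetical test field $y(\eta) = a \cdot \eta$, whose Riemannian gradient is the tangential projection $a - (a \cdot \eta)\,\eta$; pinhole derivatives are approximated by central differences.

```python
import math

def pinhole_to_sphere(z1, z2):
    rho = math.sqrt(1.0 + z1 * z1 + z2 * z2)
    return (z1 / rho, z2 / rho, 1.0 / rho)

def lift_gradient(z1, z2, dy_dz1, dy_dz2):
    """Cartesian coordinates in R^3 of the Riemannian gradient of y on S^2,
    computed from its pinhole partial derivatives at (z1, z2)."""
    rho = math.sqrt(1.0 + z1 * z1 + z2 * z2)
    return (rho * dy_dz1, rho * dy_dz2, -rho * (z1 * dy_dz1 + z2 * dy_dz2))

# Hypothetical test field y(eta) = a . eta, evaluated through the pinhole chart.
a = (0.3, -1.2, 0.5)

def y_of_z(z1, z2):
    return sum(ai * ei for ai, ei in zip(a, pinhole_to_sphere(z1, z2)))

z1, z2, h = 0.25, -0.4, 1e-6
dy1 = (y_of_z(z1 + h, z2) - y_of_z(z1 - h, z2)) / (2 * h)   # central differences
dy2 = (y_of_z(z1, z2 + h) - y_of_z(z1, z2 - h)) / (2 * h)
grad = lift_gradient(z1, z2, dy1, dy2)

eta = pinhole_to_sphere(z1, z2)
a_dot_eta = sum(ai * ei for ai, ei in zip(a, eta))
expected = tuple(ai - a_dot_eta * ei for ai, ei in zip(a, eta))
print(max(abs(g - e) for g, e in zip(grad, expected)))   # finite-difference error only
```

The same lift applies to $\nabla\Gamma$, which is how the bracketed factors multiplying $\partial y/\partial z_i$ and $\partial\Gamma/\partial z_i$ arise.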



3 Depth estimation inspired by the Horn-Schunck method

3.1 The Horn-Schunck variational method 

In [19], Horn and Schunck described a method to compute the optical flow, defined as "the distribution of apparent velocities of movement of brightness patterns in an image". The entire method is based on the optical flow constraint, written in the compact form

$$\frac{\partial y}{\partial t} + \frac{\partial y}{\partial z_1} V_1 + \frac{\partial y}{\partial z_2} V_2 = 0.$$

Identification with the pinhole form of (3) yields

$$V_1(t,z) = f_1(z,\omega(t)) + \Gamma(t,z)\,g_1(z,v(t)), \qquad V_2(t,z) = f_2(z,\omega(t)) + \Gamma(t,z)\,g_2(z,v(t))$$

with

$$f_1(z,\omega) = z_1 z_2\,\omega_1 - (1+z_1^2)\,\omega_2 + z_2\,\omega_3, \qquad g_1(z,v) = \sqrt{1+z_1^2+z_2^2}\,(-v_1 + z_1 v_3),$$
$$f_2(z,\omega) = (1+z_2^2)\,\omega_1 - z_1 z_2\,\omega_2 - z_1\,\omega_3, \qquad g_2(z,v) = \sqrt{1+z_1^2+z_2^2}\,(-v_2 + z_2 v_3). \tag{7}$$
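To check the expressions of $f_1, f_2, g_1, g_2$ given above, the following Python sketch compares $(V_1, V_2)$ with a finite-difference derivative of the projection $z = (X/Z, Y/Z)$ of a static point $P = (X, Y, Z)$ expressed in the frame of a camera moving with velocities $(v, \omega)$, so that $\dot P = -v - \omega \times P$ and $\Gamma = 1/\|P\|$. The point, the velocities and the helper names are arbitrary test choices.

```python
import math

def pinhole_flow(z1, z2, omega, v, gamma):
    """(V1, V2) = (f1 + Gamma g1, f2 + Gamma g2) at pinhole coordinates (z1, z2)."""
    w1, w2, w3 = omega
    v1, v2, v3 = v
    rho = math.sqrt(1.0 + z1 * z1 + z2 * z2)
    f1 = z1 * z2 * w1 - (1.0 + z1 * z1) * w2 + z2 * w3
    f2 = (1.0 + z2 * z2) * w1 - z1 * z2 * w2 - z1 * w3
    g1 = rho * (-v1 + z1 * v3)
    g2 = rho * (-v2 + z2 * v3)
    return (f1 + gamma * g1, f2 + gamma * g2)

def project(P):
    return (P[0] / P[2], P[1] / P[2])

# Static point expressed in the camera frame: dP/dt = -v - omega x P.
P = (0.4, -0.3, 2.5)
omega, v = (0.1, -0.2, 0.3), (0.5, 0.2, -0.4)
omega_x_P = (omega[1] * P[2] - omega[2] * P[1],
             omega[2] * P[0] - omega[0] * P[2],
             omega[0] * P[1] - omega[1] * P[0])
dP = tuple(-vi - ci for vi, ci in zip(v, omega_x_P))

h = 1e-6
zp = project(tuple(p + h * d for p, d in zip(P, dP)))
zm = project(tuple(p - h * d for p, d in zip(P, dP)))
V_fd = ((zp[0] - zm[0]) / (2 * h), (zp[1] - zm[1]) / (2 * h))

z1, z2 = project(P)
V = pinhole_flow(z1, z2, omega, v, 1.0 / math.sqrt(sum(c * c for c in P)))
print(V, V_fd)   # the two pairs agree up to finite-difference error
```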
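For each time $t$, the apparent velocity field $V = V_1 \frac{\partial}{\partial z_1} + V_2 \frac{\partial}{\partial z_2}$ is then estimated by minimizing, with respect to $W = W_1 \frac{\partial}{\partial z_1} + W_2 \frac{\partial}{\partial z_2}$, the following cost (the image $\mathcal{I}$ is a rectangle of $\mathbb{R}^2$ here):

$$I(W) = \iint_{\mathcal{I}} \left( \Big(\frac{\partial y}{\partial t} + \frac{\partial y}{\partial z_1} W_1 + \frac{\partial y}{\partial z_2} W_2\Big)^2 + \alpha^2 \big(\|\nabla W_1\|^2 + \|\nabla W_2\|^2\big) \right) dz_1\,dz_2 \tag{8}$$

where $\nabla$ is the gradient operator in the Euclidean plane $(z_1, z_2)$, $\alpha > 0$ is a regularization parameter and the partial derivatives $\frac{\partial y}{\partial t}$, $\frac{\partial y}{\partial z_1}$ and $\frac{\partial y}{\partial z_2}$ are assumed to be known.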

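Such a Horn-Schunck estimate of $V$ at time $t$ is denoted by $V^{HS}(t,z) = V^{HS}_1(t,z)\frac{\partial}{\partial z_1} + V^{HS}_2(t,z)\frac{\partial}{\partial z_2}$. For each time $t$, the usual calculus of variations yields the following PDEs for $V^{HS}$:

$$\Big(\frac{\partial y}{\partial z_1}\Big)^2 V^{HS}_1 + \frac{\partial y}{\partial z_1}\frac{\partial y}{\partial z_2} V^{HS}_2 = \alpha^2 \Delta V^{HS}_1 - \frac{\partial y}{\partial z_1}\frac{\partial y}{\partial t}$$

$$\frac{\partial y}{\partial z_1}\frac{\partial y}{\partial z_2} V^{HS}_1 + \Big(\frac{\partial y}{\partial z_2}\Big)^2 V^{HS}_2 = \alpha^2 \Delta V^{HS}_2 - \frac{\partial y}{\partial z_2}\frac{\partial y}{\partial t}$$

with boundary conditions $\frac{\partial V^{HS}_1}{\partial n} = \frac{\partial V^{HS}_2}{\partial n} = 0$ ($n$ the normal to $\partial\mathcal{I}$). Here $\Delta$ is the Laplacian operator in the Euclidean space $(z_1, z_2)$. The numerical resolution is usually based on: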

• computation of $\frac{\partial y}{\partial z_1}$, $\frac{\partial y}{\partial z_2}$ and $\frac{\partial y}{\partial t}$ via differentiation filters (Sobel filtering) applied directly to the image data at different times around $t$;

• approximation of $\Delta V^{HS}_1$ and $\Delta V^{HS}_2$ by the difference between the weighted means $\bar V^{HS}_1$ and $\bar V^{HS}_2$ of $V^{HS}_1$ and $V^{HS}_2$ over the neighboring pixels and their values at the current pixel;

• iterative resolution (Jacobi scheme) of the resulting linear system in $V^{HS}_1$ and $V^{HS}_2$.
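With the approximation $\Delta V^{HS}_i \approx \bar V^{HS}_i - V^{HS}_i$, each pixel gives a $2 \times 2$ linear system whose solution is $V^{HS} = \bar V^{HS} - \nabla y\,\big(\nabla y \cdot \bar V^{HS} + \partial y/\partial t\big)/\big(\alpha^2 + \|\nabla y\|^2\big)$, with $\nabla y = (\partial y/\partial z_1, \partial y/\partial z_2)$. The pure-Python sketch below iterates this update. The unweighted 4-neighbour mean and the replicated borders (a discrete zero normal derivative) are our own simplifying choices, and the derivative fields are given as inputs rather than computed with Sobel filters.

```python
import math

def horn_schunck_jacobi(Iz1, Iz2, It, alpha, n_iter):
    """Jacobi iterations for the Horn-Schunck equations on a rows x cols grid.
    Iz1, Iz2, It: partial derivatives of y (lists of rows), assumed known."""
    rows, cols = len(It), len(It[0])
    V1 = [[0.0] * cols for _ in range(rows)]
    V2 = [[0.0] * cols for _ in range(rows)]
    a2 = alpha * alpha

    def mean4(F, i, j):
        # 4-neighbour mean with replicated borders (zero normal derivative)
        return 0.25 * (F[max(i - 1, 0)][j] + F[min(i + 1, rows - 1)][j]
                       + F[i][max(j - 1, 0)] + F[i][min(j + 1, cols - 1)])

    for _ in range(n_iter):
        new1 = [[0.0] * cols for _ in range(rows)]
        new2 = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                m1, m2 = mean4(V1, i, j), mean4(V2, i, j)
                gx, gy = Iz1[i][j], Iz2[i][j]
                r = (gx * m1 + gy * m2 + It[i][j]) / (a2 + gx * gx + gy * gy)
                new1[i][j] = m1 - gx * r
                new2[i][j] = m2 - gy * r
        V1, V2 = new1, new2
    return V1, V2

# Synthetic check: gradients in varied directions, uniform true flow (0.5, -0.25).
n = 8
theta = [[0.7 * i + 1.3 * j for j in range(n)] for i in range(n)]
Iz1 = [[math.cos(t) for t in row] for row in theta]
Iz2 = [[math.sin(t) for t in row] for row in theta]
It = [[-(0.5 * Iz1[i][j] - 0.25 * Iz2[i][j]) for j in range(n)] for i in range(n)]
V1, V2 = horn_schunck_jacobi(Iz1, Iz2, It, alpha=1.0, n_iter=3000)
print(V1[3][4], V2[3][4])   # close to 0.5 and -0.25
```

Because a uniform flow has zero Laplacian and satisfies the constraint exactly, it is the minimizer here, which makes the synthetic case easy to check.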

The convergence of this numerical method was proven in [20]. Three parameters have a direct impact on the speed of convergence and on the precision: the regularization parameter $\alpha$, the number of iterations of the Jacobi scheme, and the initial values of $V^{HS}_1$ and $V^{HS}_2$ at the beginning of this iteration step. To be specific, $\alpha$ should be neither too small, in order to filter the noise appearing in the differentiation filters applied to $y$, nor too large, in order to keep $V^{HS}$ close to $V$ when $\nabla V \neq 0$.



3.2 Adaptation to depth estimation 

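Instead of minimizing the cost $I$ given by (8) with respect to arbitrary $W_1$ and $W_2$, let us define a new invariant cost $J$,

$$J(\Gamma) = \iint_{\mathcal{J}} \left( \Big(\frac{\partial y}{\partial t} + \nabla y \cdot \big(\eta \times (\omega + \Gamma\,\eta \times v)\big)\Big)^2 + \alpha^2 \|\nabla\Gamma\|^2 \right) d\sigma_\eta \tag{9}$$

and minimize it with respect to any depth profile $\mathcal{J} \ni \eta \mapsto \Gamma(t,\eta) \in \mathbb{R}$. The time $t$ is fixed here and $d\sigma_\eta$ is the Riemannian infinitesimal surface element on $\mathbb{S}^2$. $\mathcal{J} \subset \mathbb{S}^2$ is the domain where $y$ is measured and $\alpha > 0$ is the regularization parameter.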

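The first-order stationarity condition of $J$ with respect to any variation of $\Gamma$ yields the following invariant PDE characterizing the resulting estimate $\Gamma^{HS}$ of $\Gamma$:

$$\alpha^2 \Delta\Gamma^{HS} = \Big(\frac{\partial y}{\partial t} + \nabla y \cdot \big(\eta \times (\omega + \Gamma^{HS}\,\eta \times v)\big)\Big)\Big(\nabla y \cdot \big(\eta \times (\eta \times v)\big)\Big) \quad \text{on } \mathcal{J} \tag{10}$$

$$\frac{\partial \Gamma^{HS}}{\partial n} = 0 \quad \text{on } \partial\mathcal{J} \tag{11}$$

where $\Delta\Gamma^{HS}$ is the Laplacian of $\Gamma^{HS}$ on the Riemannian sphere $\mathbb{S}^2$ and $\partial\mathcal{J}$ is the boundary of $\mathcal{J}$, assumed to be piecewise smooth and with unit normal vector $n$.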

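In pinhole coordinates $(z_1, z_2)$, we have

$$d\sigma_\eta = (1 + z_1^2 + z_2^2)^{-3/2}\, dz_1\, dz_2,$$

$$\|\nabla\Gamma\|^2 = (1 + z_1^2 + z_2^2)\left( \Big(\frac{\partial\Gamma}{\partial z_1}\Big)^2 + \Big(\frac{\partial\Gamma}{\partial z_2}\Big)^2 + \Big(z_1\frac{\partial\Gamma}{\partial z_1} + z_2\frac{\partial\Gamma}{\partial z_2}\Big)^2 \right),$$

$$\Big(\frac{\partial y}{\partial t} + \nabla y \cdot \big(\eta \times (\omega + \Gamma\,\eta \times v)\big)\Big)^2 = (F + \Gamma G)^2,$$

with $F = \frac{\partial y}{\partial t} + \frac{\partial y}{\partial z_1} f_1 + \frac{\partial y}{\partial z_2} f_2$ and $G = \frac{\partial y}{\partial z_1} g_1 + \frac{\partial y}{\partial z_2} g_2$. Consequently, the first-order stationarity condition (10) reads in $(z_1, z_2)$ coordinates

on the rectangular domain $\mathcal{I} = [-\bar z_1, \bar z_1] \times [-\bar z_2, \bar z_2]$ ($\bar z_1, \bar z_2 > 0$ with $\bar z_1^2 + \bar z_2^2 < 1$).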

The right-hand side of (13) corresponds to the Laplacian operator on the Riemannian sphere $\mathbb{S}^2$ in pinhole coordinates. The numerical resolution of this scalar diffusion equation, which provides the estimate $\Gamma^{HS}$ of $\Gamma$, is similar to the one used for the Horn-Schunck estimate $V^{HS}$ of $V$. The functional $I(W)$ defined in (8) is minimized with respect to two varying parameters, $W_1$ and $W_2$, while there is really only one unknown function in this problem: the depth field $\Gamma$. In contrast, the functional $J(\Gamma)$ takes full advantage of the knowledge of the camera dynamics, since the only varying parameter here is $\Gamma$.



4 Depth estimation via asymptotic observers 
4.1 Asymptotic observer based on optical flow measures ($V^{HS}$)

From any optical flow estimation, such as $V^{HS}$, it is reasonable to assume that we have access, at each time $t$, to the components in pinhole coordinates of the vector field

$$\varpi_t : \mathbb{S}^2 \ni \eta \mapsto \varpi_t(\eta) = \eta \times (\omega + \Gamma\,\eta \times v) \in T_\eta\mathbb{S}^2 \tag{14}$$

appearing in (4). This vector field can be considered as a measured output for (4), expressed as $\varpi_t(\eta) = f_t(\eta) + \Gamma(t,\eta)\, g_t(\eta)$, where $f_t$ and $g_t$ are the vector fields

$$f_t : \mathbb{S}^2 \ni \eta \mapsto f_t(\eta) = \eta \times \omega \in T_\eta\mathbb{S}^2 \tag{15}$$

$$g_t : \mathbb{S}^2 \ni \eta \mapsto g_t(\eta) = \eta \times (\eta \times v) \in T_\eta\mathbb{S}^2 \tag{16}$$

This enables us to propose the following asymptotic observer for $D = 1/\Gamma$, since $D$ obeys $\frac{\partial D}{\partial t} = -\nabla D \cdot \varpi_t - v \cdot \eta$:

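$$\frac{\partial \hat D}{\partial t} = -\nabla \hat D \cdot \varpi_t - v \cdot \eta + k\, g_t \cdot \big(\hat D f_t + g_t - \hat D \varpi_t\big) \tag{17}$$

where $\varpi_t$, $f_t$ and $g_t$ are known time-varying vector fields on $\mathbb{S}^2$ and $k > 0$ is a tuning parameter. This observer is trivially SO(3)-invariant and reads in pinhole coordinates: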



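$$\frac{\partial \hat D}{\partial t} = -\frac{\partial \hat D}{\partial z_1} V_1 - \frac{\partial \hat D}{\partial z_2} V_2 - \frac{z_1 v_1 + z_2 v_2 + v_3}{\sqrt{1+z_1^2+z_2^2}} + k\Big( g_1 \big(\hat D f_1 + g_1 - \hat D V_1\big) + g_2 \big(\hat D f_2 + g_2 - \hat D V_2\big) \Big) \tag{18}$$

where $V$ is given by any optical flow estimation and $(f_1, f_2, g_1, g_2)$ are defined in subsection 3.1.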
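Substituting $V_i = f_i + \Gamma g_i$ (noise-free optical flow) into the correction term of (18) gives $k\,\Gamma\,(g_1^2 + g_2^2)(D - \hat D)$: the term vanishes when $\hat D = D$ and pushes $\hat D$ toward $D$ otherwise. The short Python check below illustrates this identity at a single pixel; all numerical values are arbitrary.

```python
def correction_term(D_hat, f, g, V, k):
    """k * sum_i g_i (D_hat f_i + g_i - D_hat V_i), the correction term of (18)."""
    return k * sum(gi * (D_hat * fi + gi - D_hat * Vi) for fi, gi, Vi in zip(f, g, V))

k, Gamma, D_hat = 50.0, 0.25, 5.0                     # true depth D = 1 / Gamma = 4
f, g = (0.3, -0.1), (0.8, 0.5)
V = tuple(fi + Gamma * gi for fi, gi in zip(f, g))    # noise-free optical flow
lhs = correction_term(D_hat, f, g, V, k)
rhs = k * Gamma * (g[0] ** 2 + g[1] ** 2) * (1.0 / Gamma - D_hat)
print(lhs, rhs)   # both approximately -11.125
```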

As assumed in the first paragraphs of subsection 2.1, for each time $t$ there is a one-to-one smooth mapping between $\eta \in \mathbb{S}^2$ attached to the camera pixel and the scene point $M(s)$ corresponding to this pixel. This means that, for any $t \geq 0$, the flow $\phi(t,s)$ defined by

$$\frac{\partial \phi}{\partial t} = \varpi_t\big(\phi(t,s)\big), \qquad \phi(0,s) = s \in \mathbb{S}^2 \tag{19}$$

defines a time-varying diffeomorphism on $\mathbb{S}^2$. Let us denote by $\phi^{-1}$ the inverse diffeomorphism: $\phi(t, \phi^{-1}(t,\eta)) = \eta$. Assume that $\Gamma(t,\eta) > 0$, $v(t)$ and $\omega(t)$ are uniformly bounded for $t \geq 0$ and $\eta \in \mathbb{S}^2$. This means that the trajectory of the camera center $C(t)$ remains strictly inside the convex surface $\Sigma$, with a positive minimal distance to $\Sigma$. These considerations motivate the assumptions used in the following theorem.

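Theorem 1. Consider $\Gamma(t,\eta)$ associated with the motion of the camera inside the domain $\Omega$ delimited by the scene $\Sigma$, a $C^1$, convex and closed surface, as explained in subsection 2.1. Assume that there exist $\bar v > 0$, $\bar\omega > 0$, $\gamma > 0$ and $\bar\Gamma > 0$ such that

$$\forall t \geq 0,\ \forall \eta \in \mathbb{S}^2, \quad |v(t)| \leq \bar v, \quad |\omega(t)| \leq \bar\omega, \quad \gamma \leq \Gamma(t,\eta) \leq \bar\Gamma.$$

Then, for $t \geq 0$, $\Gamma(t,\eta)$ is a $C^1$ solution of (4). Consider the observer (17) with an initial condition $\hat D(0,\eta)$ that is $C^1$ versus $\eta$. Then we have the following implications:

• $\forall t \geq 0$, the solution $\hat D(t,\eta)$ of (17) exists, is unique and remains $C^1$ versus $\eta$. Moreover,
$$t \mapsto \|\hat D(t,\cdot) - D(t,\cdot)\|_{L^\infty} = \max_{\eta \in \mathbb{S}^2} \big|\hat D(t,\eta) - D(t,\eta)\big|$$
is non-increasing ($L^\infty$ stability).

• If additionally, for all $s \in \mathbb{S}^2$, $\int_0^{+\infty} \|g_\tau(\phi(\tau,s))\|^2\, d\tau = +\infty$, then we have, for all $p > 0$,
$$\lim_{t \to +\infty} \int_{\mathbb{S}^2} \big|\hat D(t,\eta) - D(t,\eta)\big|^p\, d\sigma_\eta = 0$$
(convergence in any $L^p$ topology).

• If additionally there exist $\lambda > 0$ and $T > 0$ such that, for all $t \geq T$ and $s \in \mathbb{S}^2$, $\int_0^t \|g_\tau(\phi(\tau,s))\|^2\, d\tau \geq \lambda t$, then we have, for all $t \geq T$,
$$\|\hat D(t,\cdot) - D(t,\cdot)\|_{L^\infty} \leq e^{-k\gamma\lambda t}\, \|\hat D(0,\cdot) - D(0,\cdot)\|_{L^\infty}$$
(exponential convergence in $L^\infty$ topology).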

The assumptions on $\int \|g_t(\phi(t,s))\|^2\, dt$ can be seen as a persistent excitation condition. It should be satisfied for generic motions of the camera.

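Proof. The facts that $v$, $\omega$ and $\Gamma$ are bounded and that the scene surface $\Sigma$ is $C^1$, closed and convex ensure that the mapping $\eta = \phi(t,s)$ and its inverse $s = \phi^{-1}(t,\eta)$ are $C^1$ diffeomorphisms on $\mathbb{S}^2$ with bounded derivatives versus $s$ and $\eta$ for all $t \geq 0$. Therefore, $\Gamma$ is also a function of $(t,s)$. Set $\tilde\Gamma(t,s) = \Gamma(t,\phi(t,s))$: in the independent variables $(t,s)$, the partial differential equation (4) becomes a set of ordinary differential equations indexed by $s$:

$$\frac{\partial \tilde\Gamma}{\partial t} = \tilde\Gamma^2\, v(t) \cdot \phi(t,s),$$

which also reads $\frac{\partial \tilde D}{\partial t} = -v(t) \cdot \phi(t,s)$ with $\tilde D = 1/\tilde\Gamma$. Thus $\tilde D(t,s) - \tilde D(0,s) = -\int_0^t v(\tau) \cdot \phi(\tau,s)\, d\tau$. Consequently, $\tilde\Gamma$ is $C^1$ versus $s$, and thus $\Gamma$ is $C^1$ versus $\eta$.

Set $\hat{\tilde D}(t,s) = \hat D(t,\phi(t,s))$. Then

$$\frac{\partial \hat{\tilde D}}{\partial t} = -v(t) \cdot \phi(t,s) + k\, \|g_t(\phi(t,s))\|^2\, \tilde\Gamma(t,s)\, \big(\tilde D(t,s) - \hat{\tilde D}(t,s)\big).$$

Set $E = \hat D - D$ and $\tilde E = \hat{\tilde D} - \tilde D$. Then

$$\frac{\partial \tilde E}{\partial t} = -k\, \tilde\Gamma(t,s)\, \|g_t(\phi(t,s))\|^2\, \tilde E(t,s). \tag{20}$$

Consequently, $\tilde E$ is well defined for any $t \geq 0$ and $C^1$ versus $s$. Thus $E$, and consequently $\hat D = E + D$, are also well defined for all $t \geq 0$ and $C^1$ versus $\eta$. Since for any $s$ and $t_2 \geq t_1 \geq 0$ we have $|\tilde E(t_2,s)| \leq |\tilde E(t_1,s)|$, we also have

$$|\tilde E(t_2,s)| \leq \max_{s' \in \mathbb{S}^2} |\tilde E(t_1,s')| = \|E(t_1,\cdot)\|_{L^\infty}.$$

Thus, taking the max versus $s$, we get

$$\|\hat D(t_2,\cdot) - D(t_2,\cdot)\|_{L^\infty} \leq \|\hat D(t_1,\cdot) - D(t_1,\cdot)\|_{L^\infty}.$$

Since

$$\tilde E(t,s) = \tilde E(0,s)\, \exp\Big(-k \int_0^t \tilde\Gamma(\tau,s)\, \|g_\tau(\phi(\tau,s))\|^2\, d\tau\Big),$$

we have (with $\phi(0,s) = s$)

$$|\tilde E(t,s)| \leq |E(0,s)|\, \exp\Big(-k\gamma \int_0^t \|g_\tau(\phi(\tau,s))\|^2\, d\tau\Big).$$

Take $p > 0$. Then

$$\int_{\mathbb{S}^2} |E(t,\eta)|^p\, d\sigma_\eta = \int_{\mathbb{S}^2} |\tilde E(t,s)|^p\, \Big|\det\Big(\frac{\partial\phi}{\partial s}(t,s)\Big)\Big|\, d\sigma_s.$$

By assumption, $\frac{\partial\phi}{\partial s}$ is bounded. Thus there exists $C > 0$ such that

$$\|E(t,\cdot)\|_{L^p}^p = \int_{\mathbb{S}^2} |E(t,\eta)|^p\, d\sigma_\eta \leq C \int_{\mathbb{S}^2} |\tilde E(t,s)|^p\, d\sigma_s.$$

When $\int_0^{+\infty} \|g_\tau(\phi(\tau,s))\|^2\, d\tau = +\infty$, for each $s$ we have $\lim_{t\to+\infty} \tilde E(t,s) = 0$. Moreover, $|\tilde E(t,s)|$ is uniformly bounded by the $L^\infty$ function $|E(0,s)|$. By the Lebesgue dominated convergence theorem, $\lim_{t\to+\infty} \|\tilde E(t,\cdot)\|_{L^p} = 0$. The previous inequality leads to $\lim_{t\to+\infty} \|E(t,\cdot)\|_{L^p} = 0$.

When, for $t \geq T$, $\int_0^t \|g_\tau(\phi(\tau,s))\|^2\, d\tau \geq \lambda t$, we have, for all $s \in \mathbb{S}^2$, $|\tilde E(t,s)| \leq |E(0,s)|\, e^{-k\gamma\lambda t}$. Thus, for all $s \in \mathbb{S}^2$, $|\tilde E(t,s)| \leq \|E(0,\cdot)\|_{L^\infty}\, e^{-k\gamma\lambda t}$. Since $\eta \mapsto \phi^{-1}(t,\eta)$ is a diffeomorphism of $\mathbb{S}^2$, we finally get, for all $\eta \in \mathbb{S}^2$, $|E(t,\eta)| \leq \|E(0,\cdot)\|_{L^\infty}\, e^{-k\gamma\lambda t}$. This proves $\|E(t,\cdot)\|_{L^\infty} \leq \|E(0,\cdot)\|_{L^\infty}\, e^{-k\gamma\lambda t}$.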

□ 
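The mechanism of the proof can be illustrated on a single characteristic. In the hypothetical setup below (our own, not the paper's experiment), the camera does not rotate, so camera and reference frames coincide, and it translates with $v(t) = (\sin t, 0, 0)$ in front of a static point $P$. Along the characteristic, (17) reduces to $\frac{d}{dt}\hat D = -v \cdot \eta + k\,\Gamma\,\|g_t\|^2 (D - \hat D)$ with $\|g_t\|^2 = \|v\|^2 - (v \cdot \eta)^2$. The true depth is used here only to evaluate this correction term, which (17) obtains from the measured field $\varpi_t$. A forward-Euler sketch shows the initial error being removed when $k > 0$ and left essentially unchanged when $k = 0$.

```python
import math

def simulate_characteristic(k, D_hat0=4.0, t_end=2 * math.pi, dt=1e-3):
    """Forward-Euler integration of observer (17) along one characteristic.
    Returns (final |D_hat - D|, accumulated excitation int ||g_t||^2 dt)."""
    P = (0.0, 0.0, 3.0)                # static scene point (camera starts at origin)
    C = [0.0, 0.0, 0.0]                # camera centre
    D_hat, t, excitation = D_hat0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        v = (math.sin(t), 0.0, 0.0)
        d = [p - c for p, c in zip(P, C)]
        D = math.sqrt(sum(x * x for x in d))
        v_eta = sum(vi * x / D for vi, x in zip(v, d))        # v . eta
        g_sq = sum(vi * vi for vi in v) - v_eta ** 2          # ||eta x (eta x v)||^2
        D_hat += dt * (-v_eta + k * g_sq * (D - D_hat) / D)   # Gamma = 1 / D
        excitation += dt * g_sq
        C = [c + dt * vi for c, vi in zip(C, v)]
        t += dt
    D_final = math.sqrt(sum((p - c) ** 2 for p, c in zip(P, C)))
    return abs(D_hat - D_final), excitation

err_k, exc = simulate_characteristic(k=50.0)
err_0, _ = simulate_characteristic(k=0.0)
print(err_0, err_k, exc)   # err_0 stays near 1, err_k is close to 0
```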

4.2 Asymptotic observer based on rough depth estimation ($\Gamma^{HS}$)

Instead of basing the observer on the estimate $V^{HS}$, we can base it on $\Gamma^{HS}$. Then (17) becomes ($k > 0$)

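$$\frac{\partial \hat D}{\partial t} = -\nabla \hat D \cdot \big(f_t + \Gamma^{HS} g_t\big) - v \cdot \eta + k\big(1 - \hat D\,\Gamma^{HS}\big) \tag{21}$$

which reads in pinhole coordinates

$$\frac{\partial \hat D}{\partial t} = -\frac{\partial \hat D}{\partial z_1}\big(f_1 + \Gamma^{HS} g_1\big) - \frac{\partial \hat D}{\partial z_2}\big(f_2 + \Gamma^{HS} g_2\big) - \frac{z_1 v_1 + z_2 v_2 + v_3}{\sqrt{1+z_1^2+z_2^2}} + k\big(1 - \hat D\,\Gamma^{HS}\big) \tag{22}$$

For this observer we have the following convergence result.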

Theorem 2. Take the assumptions of Theorem 1 concerning the scene surface $\Sigma$, $\Gamma = 1/D$, $v$ and $\omega$. Consider the observer (21), where $\Gamma^{HS}$ coincides with $\Gamma$ and where the initial condition is $C^1$ versus $\eta$. Then, $\forall t \geq 0$, the solution $\hat D(t,\eta)$ of (21) exists, is unique, remains $C^1$ versus $\eta$ and

$$\|\hat D(t,\cdot) - D(t,\cdot)\|_{L^\infty} \leq e^{-k\gamma t}\, \|\hat D(0,\cdot) - D(0,\cdot)\|_{L^\infty}$$

(exponential convergence in $L^\infty$ topology).

The proof, similar to that of Theorem 1, is left to the reader.



5 Simulations and numerical implementations 



5.1 Sequence of synthetic images and method of comparison

The nonlinear asymptotic observers described in section 4 are tested on a sequence of synthetic images characterized by the following:

• virtual camera: the size of each image is 640 by 480 pixels, the frame rate of the sequence is 60 Hz and the field of view is 50 deg by 40 deg;

• motion of the virtual camera: it consists of two combined translations in a vertical plane ($v_3 = \omega_1 = \omega_2 = \omega_3 = 0$), and the velocity profiles are sinusoids with magnitude $1\ \mathrm{m\,s^{-1}}$ and different angular frequencies ($\pi$ for $v_1$ and $3\pi$ for $v_2$);

• virtual scene: it consists of a $4\ \mathrm{m^2}$ plane placed at 3 m and tilted by an angle of 0.3 rad with respect to the plane of camera motion; the observed plane is virtually painted with a gray pattern whose intensity varies sinusoidally in the horizontal and vertical directions;

• generation of the images: each pixel of an image has an integer value from 1 to 256, depending directly on the intensity of the observed surface in the direction indexed by the pixel, to which normally distributed noise with mean 0 and standard deviation $\sigma$ is added.

The virtual setup used to generate the sequence of images is represented in Fig. 2.

To compare the performances of both methods, we use the global error rate in the estimation of $D$, defined as

$$E = \frac{\int_{\mathcal{I}} \frac{|\hat D - D|}{D}\, d\sigma_\eta}{\int_{\mathcal{I}} d\sigma_\eta} \tag{23}$$

where $D$ is the true value of the depth field, $\hat D$ is the estimate computed by any of the proposed methods and $\mathcal{I}$ is the image frame.
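A discretized version of (23) on the pinhole grid can be sketched as follows. For illustration only, we assume an ideal distortion-free pinhole camera whose pixel centres are spread uniformly over $[-\tan 25°, \tan 25°] \times [-\tan 20°, \tan 20°]$ (the 50 deg by 40 deg field of view of the virtual camera), and we weight each pixel by the surface element $d\sigma_\eta = (1+z_1^2+z_2^2)^{-3/2}\,dz_1\,dz_2$; the constant pixel area cancels in the ratio. The helper names are hypothetical, and the normalization follows our reading of (23) as a solid-angle-weighted mean relative error.

```python
import math

def pinhole_grid(width, height, fov_h_deg, fov_v_deg):
    """Pinhole coordinates (z1, z2) of pixel centres for an ideal camera."""
    z1_max = math.tan(math.radians(fov_h_deg) / 2.0)
    z2_max = math.tan(math.radians(fov_v_deg) / 2.0)
    z1 = [z1_max * (2.0 * (j + 0.5) / width - 1.0) for j in range(width)]
    z2 = [z2_max * (2.0 * (i + 0.5) / height - 1.0) for i in range(height)]
    return z1, z2

def global_error_rate(D_hat, D, z1, z2):
    """Discretized (23): relative error |D_hat - D| / D averaged with the
    solid-angle weights (1 + z1^2 + z2^2)^(-3/2)."""
    num = den = 0.0
    for i, b in enumerate(z2):
        for j, a in enumerate(z1):
            w = (1.0 + a * a + b * b) ** -1.5
            num += w * abs(D_hat[i][j] - D[i][j]) / D[i][j]
            den += w
    return num / den

z1, z2 = pinhole_grid(64, 48, 50.0, 40.0)   # reduced grid for a quick check
D = [[3.0] * len(z1) for _ in z2]
D_hat = [[3.3] * len(z1) for _ in z2]        # uniform 10 % overestimate
E_rate = global_error_rate(D_hat, D, z1, z2)
print(E_rate)                                # 0.1 up to round-off
```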

5.2 Implementation of the depth estimation based on optical flow measures ($V^{HS}$)

We test the depth estimation characterized by the partial differential equation (18) on the sequence described in subsection 5.1. The optical flow input $V^{HS}$ (components $V_1$ and $V_2$) is computed by a classical Horn-Schunck method. Note that convergence Theorem 1 assumes that the domain of definition of the image is the entire unit sphere $\mathbb{S}^2$. Here, the field of view of our virtual camera limits this domain to a portion $\mathcal{I}$ of the sphere. However, the motion of our virtual camera ensures that most of the points of the scene appearing in the first image stay in the field of view of the camera during the whole sequence. The convergence of the method is only ensured for these points, and Neumann boundary conditions are chosen at the borders where the optical flow points toward the inside of the image:

$$\frac{\partial \hat D}{\partial n} = 0 \quad \text{if } n \cdot V^{HS} < 0 \tag{24}$$

Figure 2: Virtual setup used to generate the sequence of images processed in section 5.

The observer gain $k > 0$ is chosen in accordance with scaling considerations. Setting $k = 500\ \mathrm{s\,m^{-2}}$ provides a rapid convergence rate: Fig. 3 shows that after a few frames the initial relative error (blue curve) is reduced by 1/3. Setting $k = 50\ \mathrm{s\,m^{-2}}$ is more reasonable when dealing with noisy data: in Fig. 4, the initial relative error is reduced by 1/3 after around 20 frames.

More precisely, the standard deviation $\sigma$ of the noise added to the synthetic sequence of images is 1. The gains $k = 500$ and $k = 100$ are successively tested, and the associated error rates for $\hat D$ are plotted in Fig. 3. As expected, the convergence is faster for a larger gain, but at convergence, i.e., after 40 frames, the relative errors are similar and below 1.5%.

To test robustness to noisy data, the standard deviation $\sigma$ is multiplied by 20. The correction gain is tuned to $k = 50\ \mathrm{s\,m^{-2}}$. The converged errors after 40 frames increase significantly, yet stay between 12 and 14%. Note that such permanent errors cannot decrease, since this noise level first affects the optical flow estimate $V^{HS}$ that feeds the asymptotic observer. Compared to its true value $V$, the error level for $V^{HS}$ is about 15%. These results underline the fact that this approach is sensitive to the input optical flow measurements, but not directly to the noise corrupting the image data.

Figure 3: Relative estimation errors of the depth field $\hat D$ estimated by the asymptotic observer (18) filtering the optical flow input $V^{HS}$ obtained by the Horn-Schunck method, for different correction gains $k$. The noise corrupting the image data is normally distributed, with mean $\mu = 0$ and standard deviation $\sigma = 1$.

Figure 4: Relative estimation errors of the depth field $\hat D$ estimated by the asymptotic observer (18) filtering the optical flow input $V^{HS}$ obtained by the Horn-Schunck method. The noise corrupting the image data is normally distributed, with mean $\mu = 0$ and standard deviation $\sigma = 20$.



5.3 Implementation of the asymptotic observer based on rough depth estimation ($\Gamma^{HS}$)

Subsequently, the observer described by (22) was applied to the same sequence. The input depth field $\Gamma^{HS}$ is obtained as the output of (13). To adapt the numerical method to this model, we make the small-angle approximation by neglecting the second-order terms in $z$ (we neglect the curvature of $\mathbb{S}^2$ and consider that the camera image corresponds to a small part of $\mathbb{S}^2$ that can be approximated by a small Euclidean rectangle; the error of this approximation is smaller than 3% for such a rectangular image): (13) becomes

$$G^2\,\Gamma + F G = \alpha^2 \left( \frac{\partial^2 \Gamma}{\partial z_1^2} + \frac{\partial^2 \Gamma}{\partial z_2^2} \right) \tag{25}$$

$F$ and $G$ are computed using the angular and linear velocities $\omega$ and $v$ and differentiation (Sobel) filters applied directly to the image data $y(t, z_1, z_2)$. $\Delta\Gamma$ is approximated by the difference between the weighted mean $\bar\Gamma$ of $\Gamma$ over the neighboring pixels and its value at the current pixel. The resulting linear system in $\Gamma$ is solved by the Jacobi iterative scheme, initialized with the previous estimate. The regularization parameter $\alpha$ is chosen according to scaling considerations and to the magnitude of the noise: $\alpha = 300\ \mathrm{m\,s^{-1}}$ provides convergence in about 5 or 6 frames for relatively clean image data. As for the observer (22), the correction gain $k = 50\ \mathrm{s\,m^{-1}}$ enables convergence in around 20 frames.
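With $\Delta\Gamma \approx \bar\Gamma - \Gamma$, (25) can be solved pixel by pixel for $\Gamma$, which gives the Jacobi update $\Gamma \leftarrow (\alpha^2\bar\Gamma - FG)/(\alpha^2 + G^2)$. The Python sketch below implements this update under our own simplifying assumptions (unweighted 4-neighbour mean, replicated borders, $F$ and $G$ given as inputs). The test data use a hypothetical constant inverse depth $\Gamma_0$, for which $F = -\Gamma_0 G$ makes the residual $F + \Gamma G$ vanish.

```python
import math

def depth_hs_jacobi(F, G, alpha, n_iter, Gamma=None):
    """Jacobi iterations for G^2 Gamma + F G = alpha^2 Laplacian(Gamma), eq. (25),
    with Laplacian(Gamma) ~ (4-neighbour mean) - Gamma and replicated borders."""
    rows, cols = len(F), len(F[0])
    if Gamma is None:   # otherwise: warm start, e.g. from the previous frame
        Gamma = [[0.0] * cols for _ in range(rows)]
    a2 = alpha * alpha
    for _ in range(n_iter):
        new = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                mean = 0.25 * (Gamma[max(i - 1, 0)][j] + Gamma[min(i + 1, rows - 1)][j]
                               + Gamma[i][max(j - 1, 0)] + Gamma[i][min(j + 1, cols - 1)])
                new[i][j] = (a2 * mean - F[i][j] * G[i][j]) / (a2 + G[i][j] ** 2)
        Gamma = new
    return Gamma

n, Gamma0 = 10, 1.0 / 3.0
G = [[0.5 + 0.4 * math.sin(i + 2 * j) for j in range(n)] for i in range(n)]
F = [[-Gamma0 * G[i][j] for j in range(n)] for i in range(n)]
Gamma_hs = depth_hs_jacobi(F, G, alpha=1.0, n_iter=2000)
print(Gamma_hs[5][5])   # close to 1/3
```

Passing the previous frame's estimate as `Gamma` reproduces the warm start described above; since $G \neq 0$ everywhere in this example, each sweep contracts the error by at least the factor $\alpha^2/(\alpha^2 + \min G^2)$.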

As in subsection 5.2, we test the observer (22) for different levels of noise corrupting the image data. For $\sigma = 1$, the error rates associated with the input depth $\Gamma^{HS}$ and with the estimated depth $\hat D$ are plotted in Fig. 5. After only 6 images of the sequence, the error rate for $\Gamma^{HS}$ is smaller than 4% and stays below this upper bound for the rest of the sequence. On the downside, it stays larger than 2.5%. In contrast, the error rate associated with $\hat D$ keeps decreasing and reaches a minimal value of 0.5%.




Figure 5: Relative estimation error of $\Gamma^{HS}$ (blue), using the depth estimation inspired by Horn-Schunck described in subsection 3.2, and of $\hat D$ (red) estimated by the asymptotic observer (22) filtering $\Gamma^{HS}$. The noise corrupting the image data is normally distributed, with mean $\mu = 0$ and standard deviation $\sigma = 1$.

For $\sigma = 20$, the error rates associated with the input depth $\Gamma^{HS}$ and with the estimated depth $\hat D$ are plotted in Fig. 6. For the computation of $\Gamma^{HS}$, the diffusion parameter $\alpha$ is increased to $1000\ \mathrm{m\,s^{-1}}$ to account for this stronger noise. The observer filters the error associated with $\Gamma^{HS}$ (between 4 and 8%) to provide a 3% accuracy. The results show good robustness to noise for this observer.
Figure 6: Relative estimation error of $\Gamma^{HS}$ (blue), using the depth estimation inspired by Horn-Schunck described in subsection 3.2, and of $\hat D$ (red) estimated by the asymptotic observer (22) filtering $\Gamma^{HS}$. The noise corrupting the image data is normally distributed, with mean $\mu = 0$ and standard deviation $\sigma = 20$.



5.4 Experiment on real data 

To carry out the experiments, a camera was fixed on a motorized trolley traveling back and forth on a 2-meter linear track in about 6 seconds. The resolution of the motor encoder makes it possible to know the position and the speed of the trolley with micrometric precision. The camera is a Flea2 Point Grey Research VGA video camera (640 by 480 pixels) acquiring data at 20.83 fps, with a Cinegon 1.8/4.8 C-mount lens, an angular field of view of approximately 50 by 40 deg, and oriented orthogonally to the track. The scene is a static work environment, with desks, tables, chairs and lamps, lit by electric light powered by the mains at a frequency of 50 Hz. The acquisition frame rate of the camera produces an aliasing phenomenon on the video data at 4.17 Hz. In other words, the light intensity in the room varies at a frequency that cannot easily be ignored, which does not comply with the initial hypothesis that the light emitted by the scene is independent of $t$.
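The 4.17 Hz value can be reproduced with a short computation under two assumptions that the text does not state explicitly: the lamps flicker at twice the mains frequency (100 Hz), and the frame rate is exactly 125/6 ≈ 20.83 fps. A sinusoid sampled at rate $f_s$ appears at the distance between its frequency and the nearest multiple of $f_s$.

```python
def aliased_frequency(f_signal, f_sample):
    """Apparent frequency, in [0, f_sample / 2], of a sinusoid at f_signal
    sampled at f_sample."""
    r = f_signal % f_sample
    return min(r, f_sample - r)

fps = 125.0 / 6.0        # assumed exact frame rate (about 20.83 fps)
flicker = 2 * 50.0       # assumed flicker at twice the 50 Hz mains frequency
print(round(aliased_frequency(flicker, fps), 2))   # 4.17
```

With a flicker at 50 Hz instead, the same function gives about 8.33 Hz, so the reported value is consistent with the 100 Hz assumption.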



However, the impact of this temporal dependence on the equations can be reduced by normalizing the intensity of the images as $\tilde y(x,y) = y(x,y)/\bar y$, where $x$ and $y$ are the horizontal and vertical indexes of the pixels in the image, $y(x,y)$ is the intensity of this pixel and $\bar y$ is the mean intensity over the entire image.
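A minimal sketch of this normalization on an image stored as a list of rows. It illustrates that a flicker modeled as a global multiplicative gain on the intensity (an assumption made here for illustration) cancels out after normalization.

```python
def normalize_intensity(image):
    """Return y(x, y) / y_bar, where y_bar is the mean intensity of the image."""
    n_pixels = sum(len(row) for row in image)
    y_bar = sum(sum(row) for row in image) / n_pixels
    if y_bar == 0:
        raise ValueError("image has zero mean intensity")
    return [[value / y_bar for value in row] for row in image]

frame = [[10.0, 20.0], [30.0, 40.0]]
brighter = [[1.3 * value for value in row] for row in frame]   # same scene, stronger light
print(normalize_intensity(frame))   # [[0.4, 0.8], [1.2, 1.6]]
```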

The depth field was estimated via the asymptotic observer (18) based on optical flow measures. The components of the optical flow were computed using a high-quality algorithm based on the TV-L1 method (see [21] for more details). The correction gain was tuned to $k = 100\ \mathrm{s\,m^{-2}}$. An example of image data is shown in Fig. 7, and the depth estimate associated with that image at the same time $t^*$ is shown in Fig. 8. At that specific time, $t^* \approx 8$ s, the trolley has already traveled once along the track and is on its way back toward its starting point. Some specific estimates are extracted from the whole depth field (two tables, two chairs, a screen, two walls) and highlighted in black; they are compared to real measurements taken in the experimental room (in red): the estimated depth profile $\hat D(t^*,\cdot)$ exhibits a strong correlation with these seven punctual reference values of $D(t^*,\cdot)$, and the global appearance of the depth field looks very realistic.




Figure 7: The static scene as seen by the camera 




Figure 8: Estimation of the depth field associated with the image shown in Fig. 7. Depth is associated with a gray level, whose scale in meters is on the right. Some estimates are extracted from the entire field (in black) and compared to real measurements (in red).



6 Conclusions and future work



In section 2, we recalled a system of partial differential equations describing the invariant dynamics of smooth brightness and depth fields under the assumptions of a static and Lambertian environment. In section 3, we proposed an adaptation of optical flow algorithms that takes full advantage of the SO(3)-invariance of these equations and of the knowledge of the camera dynamics: it yielded an SO(3)-invariant variational method to directly estimate the depth field. In section 4, we proposed two asymptotic observers, based respectively on optical flow and on depth estimates, and proved their convergence under geometric and persistent excitation assumptions. On synthetic images, we showed in section 5 that the variational method converges rapidly, but that its performance is highly dependent on the noise level, whereas both asymptotic observers filter this noise. These asymptotic observers, based on image processing of the entire field of view of the camera, seem to be an interesting tool for dense range estimation and a complement to methods based on feature tracking.